Getting Finished on a Sports Analytics Project

by Jack Davis and David Awosoga

2023-11-23

library(rgl)
library(knitr)
knitr::knit_hooks$set(webgl = hook_webgl)

Getting Finished on a Sports Analytics Project

Last time we talked about how to start a sports analytics project. Today we’re talking about how to finish one: namely, what sorts of deliverables you could produce, and the basics of how to make them.

This talk comes to you in four parts: Deliverable options, Visualizations, Writing, and Reproducibility.

Deliverable options

Blog posts

Do it for fun and profit!

The profit isn’t going to come directly from ads or sales like it could from a more commercial, general audience blog. Instead, the entire blog is an advertisement for your skills and services. It’s a way to passively promote yourself, and a way to refer to or look at old projects when you’re not at your home computer.

It’s very good writing practice. After a while, you’ll find yourself writing better posts with less effort, and you can always go back and hide your oldest posts if you think they’re no longer optimally showing the world your skills.

“Although you can get ad revenue from blogging, the volume of people going to a stats blog usually isn’t worth much. The value you get is from showing off your expertise to the world.” – Éric Grenier http://www.threehundredeight.com/

Posters

Academic posters are like shorter versions of blog posts. Use the Better Poster R Markdown template to make one that is instantly readable.

https://betterposters.blogspot.com/

Industry Papers

Industry papers are extended versions of blog posts. They can be as long as academic papers, but come without the formality or the need for peer review.

https://www.sportlogiq.com/2020/08/07/elementor-2557/

arXiv

arXiv is a preprint server for getting research onto the internet more quickly than a traditional academic publication allows.

https://arxiv.org/

Video Essays

Video essays are a great way to get a lot of eyeballs on your work, while also getting a trickle of ad money. You can make one by narrating over a collection of visualizations. However, you need to either know video editing on top of the many other skills a sports analytics person needs, or you need a collaborator.

Dorktown/Chart Party does sports analytics videos using Google Satellite annotations.

Scorigami:

https://www.youtube.com/watch?v=9l5C8cGMueY

How to emulate this style:

https://www.youtube.com/watch?v=MfM7cqOlgds

Video Essays

Athletic Interest does video essays on the business of sport.

https://www.youtube.com/watch?v=cBRNQMolTPw

https://athleticinterest.com/

Visualizations - Examples

library(ggplot2)
library(scatterplot3d)
library(plot3D)


library(mandelbrot)

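# Anchor colours for the palette, grouped by hue
wloo_cols = c("#ffffaa", "#ffea3d", "#ffd54f", "#e4b429", # yellow
              "#dfdfdf", "#a2a2a2", "#787878", "#000000", # grey to black
              "#ffbeef", "#ff63aa", "#df2498", "#c60078") # pink

# Reversed yellow-and-grey ramp: black through grey to pale yellow
wloo_cols_2 = wloo_cols[8:1]

# 100-colour fill scale: 10 greys (black to light grey), then 90 yellows
wloo_cols_100_2 = c(colorRampPalette(wloo_cols[8:5], bias = 0.25)(10),
                    colorRampPalette(wloo_cols[4:1])(90))

# Evaluate the Mandelbrot set on a small window of the complex plane
mb4 <- mandelbrot(xlim = c(-0.83310, -0.833055),
                  ylim = c(0.20575, 0.205795),
                  resolution = list(x = 1400, y = 800),
                  iterations = 1000)

# Convert the result to a data frame with columns x, y, and value
df2 <- as.data.frame(mb4)

# Draw the raster with the custom palette and no axes or legend
g <- ggplot(df2, aes(x = x, y = y, fill = value)) +
  geom_raster(interpolate = TRUE) + theme_void() +
  scale_fill_gradientn(colours = wloo_cols_100_2, guide = "none")

plot(g)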
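The fill scale in the Mandelbrot example is built by gluing two colour ramps together. A minimal sketch of just that step, using only base R, so the palette can be checked without the mandelbrot or ggplot2 packages. The names `grey_ramp`, `yellow_ramp`, and `pal` are made up for this illustration; the colours are copied from the slide.

```r
# Yellow and grey anchor colours copied from the example (pinks omitted)
wloo_cols <- c("#ffffaa", "#ffea3d", "#ffd54f", "#e4b429",  # yellow
               "#dfdfdf", "#a2a2a2", "#787878", "#000000")  # grey to black

# 10 colours running from black to light grey. The bias argument is
# passed on to colorRamp() and changes how the anchors are spaced.
grey_ramp <- colorRampPalette(wloo_cols[8:5], bias = 0.25)(10)

# 90 colours running from dark yellow to pale yellow
yellow_ramp <- colorRampPalette(wloo_cols[4:1])(90)

# Glue the two ramps into a single 100-colour palette
pal <- c(grey_ramp, yellow_ramp)

length(pal)
head(pal, 3)
```

Because scale_fill_gradientn() spaces its colours evenly across the data range by default, the bottom tenth of the fill values gets the grey ramp and the rest gets the yellows.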

Visualizations - Examples

From Data to Viz (previously the ggplot gallery) has a lot of generic ggplot material for you to copy, paste, and modify.

ggplot gallery: https://www.data-to-viz.com/

Visualizations - Examples

Hockeyviz gives detailed visualizations of each NHL game. Patreon members get more details and get them sooner.

Hockeyviz: https://www.hockeyviz.com/ , https://www.hockeyviz.com/game/2023020243

(Excerpt from Stat 442 - Data Visualization) rgl, WebGL, and OpenGL

rgl is the R interface to the OpenGL library. (See: https://en.wikipedia.org/wiki/OpenGL)

We are using a “hook” in knitr to render rgl output with WebGL, which is OpenGL for web pages. That is one of the reasons these notes are in HTML slidy format.

See: https://bookdown.org/yihui/rmarkdown-cookbook/rgl-3d.html for details on this hook.

webgl hook

The R code block on this slide has webgl set to TRUE to allow for embedding. It won’t work in the RStudio preview viewer that pops up after you knit something, but it WILL work if you load the slides in a browser like Firefox. (Or Chrome, I guess.)

{r, webgl=TRUE}

If everything works, you should get a 3D scatterplot, made with rgl’s plot3d() function (note the lowercase d), where the dots are drawn in a rainbow arranged along the x-axis. You should be able to rotate the image by clicking and dragging.

(The lowercase d is important because there is also a plot3D with an uppercase D that comes from the plot3D package.)

x <- sort(rnorm(1000))
y <- rnorm(1000)
z <- rnorm(1000) + atan2(x,y)
plot3d(x, y, z, col = rainbow(1000))
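The rainbow runs along the x-axis only because x was sorted first, so the i-th colour lands on the i-th smallest x value. A minimal sketch of getting the same colouring without sorting the data, using base R only. `rainbow_by_x` is a hypothetical helper written for this example, not a function from rgl.

```r
# Hypothetical helper: give each point the rainbow colour that matches
# the rank of its x value, so the data do not have to be sorted.
rainbow_by_x <- function(x) {
  rainbow(length(x))[rank(x, ties.method = "first")]
}

set.seed(442)
x <- rnorm(1000)          # deliberately left unsorted
cols <- rainbow_by_x(x)

# Reading the colours in order of increasing x recovers the plain rainbow
identical(cols[order(x)], rainbow(1000))
```

With rgl loaded, the result could be passed straight to the plotting call, for example plot3d(x, y, z, col = rainbow_by_x(x)).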

webgl hook

Some 3D plot functions in other packages can also use the rgl package to make interactive graphs. One example is the scatter3d function in the car package.

This R code block also has webgl=TRUE.

library(car)
scatter3d(x = trees$Girth, y = trees$Height, z = trees$Volume)

webgl and rgl

For more information on using rgl, see the following documentation:

http://www.sthda.com/english/wiki/a-complete-guide-to-3d-visualization-device-system-in-r-r-software-and-data-visualization

Writing - General Advice

  1. Your first few things will be terrible, and that’s fine.
  2. Choose an audience.

Writing - General Advice

  1. Maximize \(\Delta\)(Understanding) within your audience.
  2. Write drunk, edit sober.

Writing - General Advice

  1. Use the skeleton method

Papers are big and intimidating to write, and imagining everything involved in writing one is pretty much impossible, at least for modern papers. Instead, it’s much easier to think about and write small parts of a paper at a time, and then do any necessary synthesis at the end.
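One way to set up a skeleton is to generate an empty R Markdown file that has only the section headings in it, then fill in one section at a time. This is a rough sketch under assumptions: the helper name `make_skeleton`, the section names, and the file name are all invented for illustration, so swap in whatever structure your deliverable needs.

```r
# Hypothetical helper: build the lines of an R Markdown skeleton,
# with a YAML header and one placeholder per section.
make_skeleton <- function(title, sections) {
  header <- c("---",
              paste0("title: \"", title, "\""),
              "output: html_document",
              "---",
              "")
  body <- unlist(lapply(sections, function(s) {
    c(paste("##", s), "", "TODO", "")
  }))
  c(header, body)
}

# Example section names (an assumption, not a required structure)
sections <- c("Question", "Data", "Methods", "Results", "Discussion")
skeleton <- make_skeleton("Draft title", sections)

# Write to a temporary file; use a path in your project in practice
path <- file.path(tempdir(), "skeleton.Rmd")
writeLines(skeleton, path)
```

Each TODO can then be replaced on its own, in any order, which is the point of the method.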

Writing - Sources

The Chicago Guide to Writing about Multivariate Analysis has some good discussions on describing statistical results, as well as the principles of tables and visualizations.

https://press.uchicago.edu/ucp/books/book/chicago/C/bo15506942.html

Writing - Sources

The Chicago Guide to Writing About Numbers is a shorter, more general version of “Multivariate Analysis”.

https://press.uchicago.edu/ucp/books/book/chicago/C/bo19910133.html

Writing - Examples

Your best approach may be to find writing examples that are both good and easy to emulate. Visualizing Baseball is a short, easy read on baseball analytics with lots of ggplot output.

https://www.goodreads.com/book/show/57496821-visualizing-baseball

Writing - Examples

Anything from the Hockey Abstract is equally good. It’s a collection of short analyses of NHL hockey, and is itself an emulation of the much older Baseball Abstract series.

http://www.hockeyabstract.com/statshot

Writing - Examples

Other honourable mentions for good books to read for inspiration and examples of what you could do:

Basketball Data Science in R (more technical, closely linked to companion package) https://www.routledge.com/Basketball-Data-Science-With-Applications-in-R/Zuccolotto-Manisera/p/book/9781138600799

Squares & Sharps, Suckers & Sharks / Monte Carlo or Bust (Gambling, less technical, but still rigorous) https://www.goodreads.com/book/show/30167627-squares-and-sharps-suckers-and-sharks

Soccernomics (Even less technical, general interest, but a good example for talking about business) https://www.goodreads.com/book/show/6617185-soccernomics